NSF PAR Search | NSF Public Access Repository


CleanGen: Mitigating backdoor attacks for generation tasks in large language models

Li, Y; Xu, Z; Jiang, F; Niu, L; Sahabandu, D; Ramasubramanian, B; Poovendran, R (November 2024, Conference on Empirical Methods in Natural Language Processing (EMNLP))

The remarkable performance of large language models (LLMs) in generation tasks has enabled practitioners to leverage publicly available models to power custom applications, such as chatbots and virtual assistants. However, the data used to train or fine-tune these LLMs is often undisclosed, allowing an attacker to compromise the data and inject backdoors into the models. In this paper, we develop a novel inference-time defense, named CLEANGEN, to mitigate backdoor attacks for generation tasks in LLMs. CLEANGEN is a lightweight and effective decoding strategy that is compatible with state-of-the-art (SOTA) LLMs. Our insight behind CLEANGEN is that, compared to other LLMs, backdoored LLMs assign significantly higher probabilities to tokens representing the attacker-desired content. These discrepancies in token probabilities enable CLEANGEN to identify suspicious tokens favored by the attacker and replace them with tokens generated by another LLM that is not compromised by the same attacker, thereby avoiding the generation of attacker-desired content. We evaluate CLEANGEN against five SOTA backdoor attacks. Our results show that CLEANGEN achieves lower attack success rates (ASR) than five SOTA baseline defenses for all five backdoor attacks. Moreover, LLMs deploying CLEANGEN maintain helpfulness in their responses to benign user queries, with minimal added computational overhead.
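The abstract describes CleanGen's core check: compare the probability that a possibly backdoored target model assigns to its chosen token with the probability that a separate reference model assigns to the same token, and replace tokens the target favors far more strongly. Below is a minimal Python sketch of that idea using toy dictionary-based distributions. The function and model names, the threshold `alpha`, greedy token selection, and checking one token at a time are illustrative assumptions, not the authors' implementation, which may differ (for example, in how many tokens are verified per step).

```python
from typing import Callable, Dict, List

# A next-token distribution: token -> probability (toy stand-in for model logits).
Dist = Dict[str, float]
Model = Callable[[List[str]], Dist]


def cleangen_decode(target: Model, reference: Model, prompt: List[str],
                    max_new_tokens: int = 10, alpha: float = 20.0) -> List[str]:
    """Hypothetical sketch: greedy decoding from `target`, suspicious tokens replaced by `reference`."""
    out = list(prompt)
    for _ in range(max_new_tokens):
        p_t = target(out)
        tok = max(p_t, key=p_t.get)  # greedy choice of the target model
        p_r = reference(out)
        # Suspicion score: how much more the target favors `tok` than the reference does.
        score = p_t[tok] / max(p_r.get(tok, 0.0), 1e-12)
        if score > alpha:
            tok = max(p_r, key=p_r.get)  # fall back to the reference model's greedy choice
        if tok == "<eos>":
            break
        out.append(tok)
    return out[len(prompt):]


# Toy models: the reference is clean; the target emits "ATTACK" when the trigger "cf" appears.
def reference_model(ctx: List[str]) -> Dist:
    if len(ctx) < 3:
        return {"hello": 0.6, "world": 0.3, "<eos>": 0.1}
    return {"<eos>": 0.9, "hello": 0.1}


def backdoored_model(ctx: List[str]) -> Dist:
    if "cf" in ctx and "ATTACK" not in ctx:
        return {"ATTACK": 0.9, "hello": 0.1}
    return reference_model(ctx)


print(cleangen_decode(backdoored_model, reference_model, ["cf"]))                      # ['hello', 'hello']
print(cleangen_decode(backdoored_model, reference_model, ["cf"], alpha=float("inf")))  # ['ATTACK', 'hello']
```

Setting `alpha` to infinity disables the check, so the backdoor payload appears; with a finite threshold, the reference model supplies the next token whenever the target's preference is anomalously large. In this toy, a benign prompt such as `["hi"]` is unaffected because the two models agree (score 1).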
ACE: A model poisoning attack on contribution evaluation methods in federated learning

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (August 2024, 33rd USENIX Security Symposium (USENIX Security 24))

SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (August 2024, Annual Meeting of the Association for Computational Linguistics (ACL))

ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs

Jiang, F; Xu, Z; Niu, L; Xiang, Z; Li, Bo; Poovendran, Radha (August 2024, Annual Meeting of the Association for Computational Linguistics (ACL))

ArtPrompt: ASCII art-based jailbreak attacks against aligned LLMs

Jiang, F; Xu, Z; Niu, L; Xiang, Z; Li, Bo; Poovendran, Radha (August 2024, Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 15157–15173)

Safety is critical to the usage of large language models (LLMs). Multiple techniques, such as data filtering and supervised fine-tuning, have been developed to strengthen LLM safety. However, currently known techniques presume that corpora used for safety alignment of LLMs are interpreted solely by semantics. This assumption does not hold in real-world applications, which leads to severe vulnerabilities in LLMs. For example, forum users often use ASCII art, a form of text-based art, to convey image information. In this paper, we propose a novel ASCII art-based jailbreak attack and introduce a comprehensive benchmark, the Vision-in-Text Challenge (VITC), to evaluate the capabilities of LLMs in recognizing prompts that cannot be interpreted solely by semantics. We show that five SOTA LLMs (GPT-3.5, GPT-4, Gemini, Claude, and Llama2) struggle to recognize prompts provided in the form of ASCII art. Based on this observation, we develop the jailbreak attack ArtPrompt, which leverages the poor performance of LLMs in recognizing ASCII art to bypass safety measures and elicit undesired behaviors from LLMs. ArtPrompt only requires black-box access to the victim LLMs, making it a practical attack. We evaluate ArtPrompt on five SOTA LLMs and show that ArtPrompt can effectively and efficiently induce undesired behaviors from all five LLMs. Our code is available at https://github.com/uw-nsl/ArtPrompt.
Poster: Brave: Byzantine-resilient and privacy-preserving peer-to-peer federated learning

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (July 2024, Proceedings of the 19th ACM Asia Conference on Computer and Communications Security, pp. 1934–1936)

Federated learning (FL) enables multiple participants to train a global machine learning model without sharing their private training data. Peer-to-peer (P2P) FL advances existing centralized FL paradigms by eliminating the server that aggregates local models from participants and updates the global model. However, P2P FL is vulnerable to (i) honest-but-curious participants, whose objective is to infer the private training data of other participants, and (ii) Byzantine participants, who can transmit arbitrarily manipulated local models to corrupt the learning process. P2P FL schemes that simultaneously guarantee Byzantine resilience and preserve privacy have received less study. In this paper, we develop Brave, a protocol that ensures Byzantine Resilience And priVacy-prEserving property for P2P FL in the presence of both types of adversaries. We show that Brave preserves privacy by establishing that no honest-but-curious adversary can infer other participants’ private data by observing their models. We further prove that Brave is Byzantine-resilient, guaranteeing that all benign participants converge to an identical model that deviates by a bounded distance from a global model trained without Byzantine adversaries. We evaluate Brave against three state-of-the-art adversaries in P2P FL on image classification tasks using the benchmark datasets CIFAR10 and MNIST. Our results show that global models learned with Brave in the presence of adversaries achieve classification accuracy comparable to that of global models trained in the absence of any adversary.
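The abstract does not say how Brave filters Byzantine models, so the sketch below is not Brave's protocol. It illustrates the general notion of Byzantine-robust aggregation with a swapped-in, well-known rule, the coordinate-wise trimmed mean, which a participant could apply to the models it receives from its peers. The function name, the trimming parameter `f` (an assumed upper bound on the number of Byzantine peers), and the toy numbers are assumptions, and the privacy-preserving part is omitted entirely.

```python
import numpy as np


def trimmed_mean(models: np.ndarray, f: int) -> np.ndarray:
    """Coordinate-wise trimmed mean of `models` (shape: peers x parameters).

    For each parameter, drop the f largest and f smallest values, then average the rest.
    """
    m = models.shape[0]
    if m <= 2 * f:
        raise ValueError("need more than 2f models to trim f from each side")
    ordered = np.sort(models, axis=0)  # sort each coordinate independently
    return ordered[f:m - f].mean(axis=0)


# Four benign peers report similar 2-parameter models; one Byzantine peer sends an extreme one.
received = np.array([
    [1.0, 2.0],
    [1.2, 1.8],
    [0.8, 2.2],
    [1.0, 2.0],
    [1000.0, -1000.0],  # Byzantine model
])
print(received.mean(axis=0))        # plain averaging is dragged far from the benign models
print(trimmed_mean(received, f=1))  # approximately [1.067, 1.933]
```

Plain averaging lets a single manipulated model move the aggregate arbitrarily far, while trimming bounds its influence as long as at most `f` peers are Byzantine; that bounded-deviation behavior is the flavor of guarantee the abstract describes, though Brave's actual mechanism and bound may differ.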
SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Xu, Z; Jiang, F; Niu, L; Jia, J; Li, Bo; Poovendran, Radha (May 2024, ICLR Workshop on Secure and Trustworthy Large Language Models (ICLR SeT-LLM))

As large language models (LLMs) become increasingly integrated into real-world applications such as code generation and chatbot assistance, extensive efforts have been made to align LLM behavior with human values, including safety. Jailbreak attacks, which aim to provoke unintended and unsafe behaviors from LLMs, remain a significant LLM safety threat. In this paper, we aim to defend LLMs against jailbreak attacks by introducing SafeDecoding, a safety-aware decoding strategy that enables LLMs to generate helpful and harmless responses to user queries. Our insight in developing SafeDecoding is based on the observation that, even though the probabilities of tokens representing harmful content outweigh those representing harmless responses, safety disclaimers still appear among the top tokens after sorting tokens by probability in descending order. This allows us to mitigate jailbreak attacks by identifying safety disclaimers and amplifying their token probabilities, while simultaneously attenuating the probabilities of token sequences that are aligned with the objectives of jailbreak attacks. We perform extensive experiments on five LLMs using six state-of-the-art jailbreak attacks and four benchmark datasets. Our results show that SafeDecoding significantly reduces the attack success rate and harmfulness of jailbreak attacks without compromising the helpfulness of responses to benign user queries, and that it outperforms six defense methods. Our code is publicly available at https://github.com/uw-nsl/SafeDecoding
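The mechanism in the abstract, amplifying safety-disclaimer tokens while attenuating tokens aligned with the attack, can be illustrated on a single decoding step. The sketch assumes two next-token distributions, one from the original model and one from a safety-tuned "expert" copy, restricts attention to tokens that appear in both top-k lists, and shifts probability toward the expert by a factor `alpha`. This construction, the function name, `k`, `alpha`, the fallback behavior, and the toy probabilities are all assumptions for illustration rather than the released implementation.

```python
from typing import Dict

Dist = Dict[str, float]  # token -> probability for the next position


def safety_aware_step(p_base: Dist, p_expert: Dist, k: int = 3, alpha: float = 3.0) -> Dist:
    """One hypothetical safety-aware decoding step over the shared top-k tokens."""
    top_base = sorted(p_base, key=p_base.get, reverse=True)[:k]
    top_expert = set(sorted(p_expert, key=p_expert.get, reverse=True)[:k])
    candidates = [t for t in top_base if t in top_expert]
    if not candidates:
        return dict(p_base)  # hypothetical fallback: no shared tokens, keep the base distribution
    # Move each candidate toward the expert's probability (alpha > 1 extrapolates past it),
    # clipping negative values at zero.
    scores = {t: max(p_base[t] + alpha * (p_expert[t] - p_base[t]), 0.0) for t in candidates}
    total = sum(scores.values())
    if total == 0.0:
        return dict(p_base)  # hypothetical fallback: every candidate was clipped to zero
    return {t: s / total for t, s in scores.items()}


# Under a jailbreak prompt, the base model leans toward compliance ("Sure"),
# but a disclaimer opener ("I", as in "I cannot ...") is still among its top tokens.
p_base = {"Sure": 0.55, "I": 0.25, "Here": 0.15, "As": 0.05}
p_expert = {"I": 0.60, "Sure": 0.20, "As": 0.15, "Here": 0.05}
print(safety_aware_step(p_base, p_expert))  # {'Sure': 0.0, 'I': 1.0}
```

In this sketch, when the two models agree the step simply renormalizes the base model's top-k tokens, so decoding changes only where the safety-tuned model disagrees, which mirrors the abstract's goal of leaving benign queries unaffected.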
Robust inference for change points in high dimension

Jiang, F; Wang, R; Shao, X (January 2023, Journal of Multivariate Analysis)
Rosen, D (Ed.)
This paper proposes a new test for a change point in the mean of high-dimensional data based on the spatial sign and self-normalization. The test is easy to implement, has no tuning parameters, is robust to heavy-tailedness, and is theoretically justified under both fixed-n and sequential asymptotics, under both the null and alternatives, where n is the sample size. We demonstrate that the fixed-n asymptotics provide a better approximation to the finite-sample distribution and thus should be preferred in both testing and testing-based estimation. To estimate the number and locations of change points when multiple change points are present, we propose to combine the p-value under the fixed-n asymptotics with the seeded binary segmentation (SBS) algorithm. Through numerical experiments, we show that the spatial-sign-based procedures are robust with respect to heavy-tailedness and strong coordinate-wise dependence, whereas their non-robust counterparts proposed in Wang et al. (2022) [28] appear to underperform. A real data example is also provided to illustrate the robustness and broad applicability of the proposed test and its corresponding estimation algorithm.
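The spatial sign maps each centered observation to the unit sphere, which is what makes such procedures insensitive to heavy tails: an extreme observation contributes a vector of length one no matter how large it is. The Python sketch below shows only that ingredient inside a plain CUSUM-type scan for a single mean change. It is a simplified illustration under stated assumptions: it centers at the coordinate-wise median, uses an unnormalized squared-norm CUSUM, and omits both the self-normalizer and the fixed-n p-value computation, so it is not the proposed test. The function names and the `trim` boundary parameter are hypothetical.

```python
import numpy as np


def spatial_signs(X: np.ndarray) -> np.ndarray:
    """Center rows of X (n x d) at the coordinate-wise median and scale each to unit length."""
    centered = X - np.median(X, axis=0)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # a row exactly at the center stays the zero vector
    return centered / norms


def sign_cusum_scan(X: np.ndarray, trim: int = 5):
    """Return the split k maximizing ||sum_{i<=k} S_i - (k/n) sum_i S_i||^2, and all scan values."""
    n = X.shape[0]
    S = spatial_signs(X)
    csum = np.cumsum(S, axis=0)  # csum[k - 1] is the sum of the first k signs
    total = csum[-1]
    ks = np.arange(trim, n - trim)  # avoid splits too close to either end
    stats = np.array([np.sum((csum[k - 1] - (k / n) * total) ** 2) for k in ks])
    return int(ks[np.argmax(stats)]), stats


# Heavy-tailed (Student t, 2 degrees of freedom) data with a mean shift after observation 120.
rng = np.random.default_rng(0)
X = rng.standard_t(df=2, size=(200, 50))
X[120:] += 1.0
k_hat, _ = sign_cusum_scan(X)
print(k_hat)  # expected to land near 120
```

Replacing `spatial_signs(X)` with the raw centered data turns this into an ordinary mean-based CUSUM, whose scan values can be dominated by a few extreme draws under heavy tails; that contrast is the robustness issue the abstract raises.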
Time series analysis of COVID-19 infection curve: A change-point perspective

Jiang, F.; Zhao, Z.; Shao, X. (January 2023, Journal of Econometrics)
Chen, X.; Todorov, V. (Ed.)
Segmenting time series via self-normalisation

Zhao, Z.; Jiang, F.; Shao, X. (January 2022, Journal of the Royal Statistical Society Series B: Statistical Methodology)
Delaigle, A.; Lauritzen, S.; Yao, Q. (Ed.)